Why effectively addressing the 'number of factors question' matters during EFA: Possible consequences of over- versus under-factoring

A key aim in exploratory factor analysis (EFA) is to identify factors that will reproduce the observed correlations among a set of measured variables or indicators. Since reproduction of the correlation matrix can always be improved by extracting more factors, the analyst's job is to identify a limited number of factors that 'contain the maximum amount of information' (Gorsuch, 1983, p. 143). By selecting a 'limited number', the analyst reduces the chance of including trivial factors that carry little information and have lower reliability.

In my previous post, I mentioned that one of the more fundamental decisions a factor analyst has to make during EFA is the number of factors that account for (reproduce) the correlations among a set of measured variables. Clark and Bowles (2018) state that there are two major kinds of errors that can be made in the context of EFA when making this determination. The first is identifying too few latent dimensions, resulting in underfactoring of the correlation matrix. The second error involves identifying too many latent dimensions, leading to overfactoring. Both under- and over-extraction distort EFA results (Zwick & Velicer, 1986). According to Fabrigar et al. (1999), researchers have largely agreed that underfactoring is the more severe error, although analysts should strive to avoid overfactoring as well.

O'Connor (2000) provided a helpful description of the effects of these two errors on the factor solution itself and on interpretation of results by the analyst:

'Under-extraction compresses variables into a small factor space, resulting in a loss of important information, a neglect of potentially important factors, a distorted fusing of two or more factors, and an increase in error in the loadings. Over-extraction diffuses variables across a large factor space, potentially resulting in factor splitting, in factors with few high loadings, and in researchers' attributing excessive substantive importance to trivial factors' (p. 396).

Echoing O'Connor (2000), Fabrigar et al. (1999) noted that when under-extraction occurs, measured variables that would have loaded onto factors that are not included in the model may end up loading incorrectly onto those that are. Indeed, two different factors could end up being combined into a single factor following rotation, clouding one's understanding of the true factorial structure. An additional problem with under-extraction is that the loadings of indicators that are correctly associated with their factors in the model may become distorted. Speaking to the effects of overfactoring, Hayton, Allen, and Scarpello (2004) stated that this can result in 'focusing on minor factors at the expense of major ones, the creation of factors with only one high loading, factors that are difficult to interpret, and factors that are unlikely to replicate' (p. 192). Moreover, over-factoring can result in improper solutions (Velicer & Fava, 1998) or 'nonsensical values' (Cooper & Waller, 2023). One such improper solution is a Heywood case, where the unique variance of one or more variables is less than or equal to zero (Dillon et al., 1987).

Having addressed some of the potential consequences of over- versus under-factoring, I now turn to a review of several approaches to addressing the 'number of factors question'. In my review, I also attempt to give you some insight into the 'behavior' of some of these methods and some indication of how likely each method is to suggest too many or too few factors.

Approaches to estimating number of factors and potential implications for decision accuracy

Eigenvalue cutoff rule of 1 (Kaiser criterion, or K1 rule): This method involves extracting a set of principal components from an unreduced correlation matrix (Finch & West, 1997) and retaining as many factors as there are components with eigenvalues greater than 1 (Kaiser, 1960). This method is well known to overestimate the number of factors (Hayton et al., 2004; Lance et al., 2006; Velicer & Fava, 1998). For example, in a seminal study by Zwick and Velicer (1986), the K1 rule was the least accurate of the several approaches tested for determining the number of factors. They found this rule consistently overfactored. [Fabrigar et al. (1999) noted it can lead to underfactoring as well.] More recently, Pearson et al. (2013) also found that the K1 rule tended to overfactor. Fabrigar et al. (1999) and Fabrigar and Wegener (2012) criticized the rule for its apparent arbitrariness: a factor with an eigenvalue of 1.01 is retained, whereas a factor with an eigenvalue of .99 is not. Even as far back as 1978, Comrey noted the influence of computer defaults (to the K1 rule) as strongly contributing to its popularity and bemoaned that its application 'requires no judgment on the part of the investigator' (p. 651). Pearson et al. (2013) observed that the K1 rule has remained the default method in many software programs and that it 'seems likely that researchers often interpret more factors than are truly present in their data; perhaps many more' (p. 13). In general, statisticians agree that the K1 rule is not a substantive basis for deciding on the number of factors (even if it is pre-programmed into your statistical package!).
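To make the mechanics of the K1 rule concrete, here is a minimal Python sketch (using NumPy). The function name kaiser_k1 and the example correlation matrix are my own illustrative assumptions and do not come from any of the sources cited here. The rule simply counts the eigenvalues of the unreduced correlation matrix that exceed 1.

```python
import numpy as np

def kaiser_k1(R):
    """Return the K1 count and the eigenvalues (descending) of correlation matrix R."""
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

# Hypothetical population matrix (an assumption for illustration):
# two blocks of three variables, correlated .60 within a block and .00 between blocks.
R = np.eye(6)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

n_retain, eigenvalues = kaiser_k1(R)
print(n_retain, np.round(eigenvalues, 2))
```

In this idealized matrix the two leading eigenvalues equal 1 + 2(.60) = 2.2 and the remaining four equal .40, so the rule retains two components. With real sample data, eigenvalues that hover around 1 are common, which is where the arbitrariness of the cutoff becomes a practical problem.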

Cattell's (1966) scree test: This method involves plotting eigenvalues from an unreduced correlation matrix against component number and then examining where there are 'breaks'. The researcher retains only those factors whose eigenvalues fall before the point where the plot 'levels off'. Zwick and Velicer (1986) commented (based on their simulation research) that the scree test 'was more accurate and less variable than either the K1 method or Bartlett's test' (p. 440). In their study, the scree test identified the correct number of factors 57% of the time, with 90% of decision errors reflecting overestimation of the number of factors. Finally, despite using two trained raters in their study, Zwick and Velicer described the reliability of those ratings as 'moderate' and 'very problematic for the applied researcher'. Nevertheless, Finch and West (1997) suggested the scree method is 'reasonably accurate', providing correct estimates or slight overestimates of the number of factors. Hayton et al. (2004) provided a more qualified assessment of the scree test, suggesting that it works well with strong factors but becomes more problematic with minor factors, when there are two or more breaks, or when there are no obvious breaks. In general, the subjectivity associated with the scree test appears to be a key limitation of the procedure. [Note: Fabrigar and Wegener (2012) suggest the scree test rely on eigenvalues from a reduced correlation matrix.] My perspective based on the literature is that the scree test may be used as a complement to other methods of determining the number of factors. However, it should not form the sole or primary basis for your determination.

Parallel analysis (PA): This method is regarded as perhaps the strongest of the statistically based approaches to deciding on the number of factors. Horn's (1965) method compares eigenvalues from the unreduced correlation matrix against eigenvalues obtained from randomly generated data. In effect, the eigenvalues from a PCA extraction (see Tabachnick & Fidell, 2013) are compared against the mean or 95th percentile of eigenvalues from correlation matrices of random data in which the variables are uncorrelated (the condition of orthogonality). Using this method, the number of factors one should retain is equal to the number of components with observed eigenvalues that exceed their corresponding randomly generated eigenvalues. In their seminal study, Zwick and Velicer (1986) found Horn's PA outperformed all other methods (eigenvalue cutoff rule, scree test, Bartlett's chi-square, and the MAP test) when determining the number of factors.
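The comparison at the heart of Horn's PA can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not O'Connor's syntax or the EFA.dimensions implementation: the function name parallel_analysis, the number of simulated datasets, the use of normally distributed random data, and the simulated two-factor dataset are all hypothetical choices.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Horn-style PA on the unreduced correlation matrix (PCA eigenvalues)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        # Random normal data with the same n and p; variables are uncorrelated
        noise = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    reference = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > reference
    # Retain components up to the first one that fails to beat its reference value
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, reference

# Hypothetical data: 500 cases, 6 variables, two uncorrelated factors (loadings .80)
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8
data = factors @ loadings.T + 0.6 * rng.standard_normal((500, 6))

n_retain, observed, reference = parallel_analysis(data)
print(n_retain)
print(np.round(observed, 2))
print(np.round(reference, 2))
```

Setting percentile=50 would approximate a median-based criterion in place of the 95th percentile; a mean-based criterion would require replacing np.percentile with a column mean.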

Various authors (e.g., Humphreys & Ilgen, 1969; Fabrigar & Wegener, 2012) have suggested PA should be performed using a reduced correlation matrix (with squared multiple correlations in the principal diagonal) instead of an unreduced correlation matrix. This suggestion has prompted a debate as to whether one approach clearly outperforms the other when determining the number of factors, and various simulation studies have been carried out to address the question. Research by Crawford et al. (2010) suggested the performance of Horn's PA (which relies on the unreduced correlation matrix) and PA using a reduced correlation matrix (with squared multiple correlations in the diagonal and principal axis factoring) may depend on the number of factors and the level of correlation between them. Timmerman and Lorenzo-Seva (2011) pitted PA methods using the unreduced and reduced correlation matrix against each other and found evidence that PA using the reduced matrix consistently overfactored. Although they found that PA based on minimum rank factor analysis performed best in their study, they indicated the PCA-based method still functioned well. Moreover, they concluded that parallel analysis using principal axis factoring of the reduced correlation matrix should be 'discouraged'.

Across two studies, Pearson et al. (2013) found that PA using reduced correlation matrices outperformed PA performed on unreduced matrices. Lim and Jahng's (2019) simulation study, on the other hand, suggested Horn's PA on an unreduced correlation matrix may generally be a better approach than PA on a reduced matrix, 'especially when the population factor structure has model error or trivial factors' (p. 14). My sense of the literature is that the question of whether one should use PA with an unreduced or a reduced correlation matrix remains largely unresolved. There is some indication that parallel analysis using the reduced correlation matrix can lead to over-factoring in some cases. Nevertheless, I find it difficult to make any strong recommendation for either approach. I hope to provide more clarification on this issue in future postings.

Velicer's Minimum Average Partial (MAP) method: The MAP method is another one of the preferred methods to use when making decisions regarding the number of factors. The method involves a series of component extractions from the unreduced correlation matrix and calculations of average squared (partial) correlations. The component solution associated with the smallest average squared partial correlation is the preferred solution and the basis on which to estimate the number of factors [see O'Connor's (2000) description]. Zwick and Velicer (1986) noted that the procedure was generally more accurate and less variable in detecting the number of factors than the eigenvalue cutoff rule of 1 and the scree plot, although it still has a tendency to underextract factors.

According to O'Connor (2000), 'The two procedures [PA and MAP] complement each other nicely, in that the MAP tends to err (when it does err) in the direction of underextraction, whereas parallel analysis tends to err (when it does err) in the direction of overextraction. Optimal decisions are thus likely to be made after the results of both analytic procedures have been considered' (p. 398).
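The logic of the MAP procedure described above can also be sketched in Python. Again, this is a simplified illustration under my own assumptions rather than a port of O'Connor's syntax: the function name velicer_map and the simulated dataset are hypothetical, and the sketch uses squared partial correlations only.

```python
import numpy as np

def velicer_map(R):
    """Average squared partial correlation after removing m = 0, 1, ..., p-1 components."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # sort components by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]       # m = 0: no components partialled out
    for m in range(1, p):
        A = loadings[:, :m]
        C = R - A @ A.T                        # partial covariances given m components
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)           # rescale to partial correlations
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    avg_sq = np.array(avg_sq)
    return int(np.argmin(avg_sq)), avg_sq

# Hypothetical data: 500 cases, 8 variables, two uncorrelated factors
rng = np.random.default_rng(2)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.7
unique_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
data = factors @ loadings.T + unique_sd * rng.standard_normal((500, 8))

n_components, avg_sq = velicer_map(np.corrcoef(data, rowvar=False))
print(n_components)
print(np.round(avg_sq, 4))
```

The number of components is the value of m at which the average squared partial correlation reaches its minimum. Running a sketch like this alongside a parallel analysis on the same data is one way to follow O'Connor's advice to consider both procedures.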

Side note: It is unfortunate that both PA and the MAP method are not built directly into SPSS. However, it is possible to use these methods using syntax written by Brian O'Connor. For a demo of the MAP syntax, go here: Determining number of factors in SPSS using Velicer's MAP Procedure

Maximum likelihood (ML) method: This method essentially involves estimating a series of factor models of increasing complexity (in terms of number of factors), evaluating the fit of each of those models, and then comparing them in terms of relative fit. The more traditional approach is to compute the chi-square goodness-of-fit test in a series of nested factor models and use this test as a basis for deciding on the number of factors. The reasoning goes that models that underextract factors will yield statistically significant chi-square tests (i.e., they fail to adequately reproduce the correlation matrix). The first model for which the chi-square test becomes non-significant is the preferred one (in terms of number of factors).

There are several downsides to this method. First, ML estimation assumes that the measured variables follow a multivariate normal distribution. When this assumption is violated, parameter estimates remain unbiased but model fit statistics (including the chi-square test) will suggest a worse-fitting model (West, Finch, & Curran, 1995). [Non-normality also tends to shrink standard errors, inflating the Type I error rate for tests of parameter estimates. However, these tests are typically not included in standard EFA applications.] Thus, in more egregious cases where multivariate normality cannot be assumed, we might expect EFA using ML to encourage overfactoring. A second limitation of the chi-square test is that the probability of rejecting an appropriate factor model is heavily affected by sample size (Fabrigar & Wegener, 2012). Since EFA is generally performed on large samples, one could easily reject a model that does a good job of reproducing the sample correlation matrix. That is, even when the discrepancy between the model-implied and observed correlation matrices is very small, a large sample can produce a statistically significant chi-square test result. Third, Fabrigar et al. (1999) note that ML-based extraction is more likely to produce an improper solution (i.e., a Heywood case or failure to converge) than principal axis factoring. An improper solution precludes use of the test. Finally, Fabrigar and Wegener (2012) mention that the chi-square test is a test of a potentially unrealistic hypothesis, i.e., exact fit of the model to the data. Rejection of the exact-fit hypothesis does not necessarily mean that the model fails to exhibit close fit to the data, so a rejected model may still be useful. Due to the second and last issues raised above, methodologists often encourage the use of descriptive fit indices (e.g., RMSEA, CFI, SRMR) when evaluating and comparing the fit of models (Fabrigar & Wegener, 2012).
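The sample size issue can be illustrated with a small amount of arithmetic in Python. This is a sketch under stated assumptions: the minimized discrepancy value (f_min) is hypothetical, the chi-square statistic is computed as (N - 1) times the discrepancy without Bartlett's small-sample correction, RMSEA uses the common N - 1 formulation, and the critical value of 9.488 applies to alpha = .05 with 4 degrees of freedom. The function names are my own.

```python
import math

def efa_df(p, m):
    """Degrees of freedom for an m-factor EFA model of p measured variables."""
    return ((p - m) ** 2 - (p + m)) // 2

def chi_square(f_min, n):
    """Likelihood-ratio statistic without Bartlett's small-sample correction."""
    return (n - 1) * f_min

def rmsea(chi2, df, n):
    """RMSEA point estimate (N - 1 formulation)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

p, m = 6, 2
df = efa_df(p, m)            # ((6 - 2)^2 - (6 + 2)) / 2 = 4
CRITICAL_05_DF4 = 9.488      # chi-square critical value for alpha = .05, df = 4
f_min = 0.015                # hypothetical minimized ML discrepancy (same in both samples)

results = {}
for n in (200, 1000):
    chi2 = chi_square(f_min, n)
    results[n] = (chi2, chi2 > CRITICAL_05_DF4, rmsea(chi2, df, n))
    print(n, round(chi2, 3), chi2 > CRITICAL_05_DF4, round(results[n][2], 3))
```

With the same small discrepancy, the model is retained at N = 200 but rejected at N = 1000, which is the sample size sensitivity described above. This is part of the reason descriptive indices such as RMSEA are often examined alongside the test.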

*******************************************************
I discuss the above and other methods in a video on determining number of factors using RStudio with the EFA.dimensions package.



*******************************************************

Other considerations when deciding on number of factors

Factor meaningfulness and minimum number of measured variables per factor: Fabrigar and Wegener (2012) argue the usefulness of a factor model ultimately rests on its ability to 'provide a conceptually sensible representation of the data' (p. 65). In those cases where there is evidence of conflicting findings using a subset (or all) of the methods described above, it is reasonable to evaluate each of the competing models for their interpretability and theoretical utility and then choose the model that best meets these criteria. When doing this, you should interpret the rotated factor loadings (Comrey, 1978). Since determination of a final factor model cannot be reduced simply to a set of 'mechanical rules and procedures', it is important for you to have a strong background in your area of research (this includes both theory and previous empirical findings). 

With respect to the minimum number of items/indicators/measured variables per factor, methodologists generally consider 3-5 to be appropriate (Fabrigar et al., 1999), with three being the absolute minimum (Hahs-Vaughn, 2016). If you are considering a model in which a factor has fewer than three nontrivial loadings (i.e., loadings that meet your minimum loading criterion; methodologists have proposed minimums ranging from .30 to .40 in absolute value), this may suggest the need to consider a different model with fewer factors.

Examination of residual matrix: The assessment of the fit of a model with a certain number of factors can be partially evaluated by examining the values in the residual correlation matrix, where the values in the off diagonal are differences between the corresponding elements in the observed correlation matrix and the reproduced (or model-implied) correlation matrix. The expectation (in the case of an exact-fitting model) is that the off diagonal elements will be zero. Ideally, most of the (non-redundant) residuals in the matrix would be near zero (Hahs-Vaughn, 2016) with the greatest proportion less than .05 (Watkins, 2021). Residuals greater than .10 can be regarded as evidence of more substantial misfit (Kline, 2016; Watkins, 2021). Treating the residual correlation matrix as an indicator of fit, it is reasonable to compare the matrices of factor models containing different numbers of factors when addressing the 'number of factors' question.  
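As a rough illustration of this kind of residual check, the Python sketch below summarizes the off-diagonal residuals for two competing models. Everything in the example is an assumption made for illustration: the function name residual_summary, the loading values, and the correlation matrix (which is constructed so that the two-factor model fits exactly). The sketch also assumes orthogonal factors, so the reproduced matrix is simply the loading matrix multiplied by its transpose.

```python
import numpy as np

def residual_summary(R, loadings):
    """Summarize non-redundant off-diagonal residuals for an orthogonal factor model."""
    reproduced = loadings @ loadings.T          # model-implied correlations (off-diagonal)
    residuals = R - reproduced
    upper = np.triu_indices_from(R, k=1)        # non-redundant off-diagonal elements
    abs_res = np.abs(residuals[upper])
    return {
        "prop_below_05": float(np.mean(abs_res < 0.05)),
        "n_above_10": int(np.sum(abs_res > 0.10)),
        "max_abs": float(abs_res.max()),
    }

# Hypothetical loadings for a two-factor model of six variables
two_factor = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                       [0.0, 0.7], [0.0, 0.6], [0.0, 0.5]])
# Hypothetical correlation matrix that the two-factor model reproduces exactly
R = two_factor @ two_factor.T
np.fill_diagonal(R, 1.0)
# Contrived one-factor alternative that ignores the second factor
one_factor = two_factor[:, :1]

fit_one = residual_summary(R, one_factor)
fit_two = residual_summary(R, two_factor)
print(fit_one)
print(fit_two)
```

In this contrived case the one-factor model leaves three residuals above .10 (the three correlations among the last three variables), whereas the two-factor model leaves none. With real data the loadings would come from your EFA output, and a model with correlated factors would need the factor correlation matrix included in the reproduced correlations.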

Use multiple approaches to factor determination: Given the limitations associated with each of the alternatives for deciding on the number of factors described above, a wise choice is to rely on several of the better-performing procedures rather than any single one. This is consistent with Comrey's (1978) and Fabrigar and Wegener's (2012) encouragement of a more 'holistic approach' to factor determination.


References

Clark, D. A., & Bowles, R. P. (2018). Model fit and item factor analysis: Overfactoring, underfactoring, and a program to guide interpretation. Multivariate Behavioral Research, 53(4), 544–558.

Comrey, A. L. (1978). Common methodological problems in factor analytic studies. Journal of Consulting and Clinical Psychology, 46, 648-659.

Crawford, A. V., Green, S. B., Levy, R., Lo, W. J., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70, 885–901.

Fabrigar, L. R., & Wegener, D. T. (2012). Exploratory factor analysis. Oxford University Press.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.

Finch, J. F., & West, S. G. (1997). The investigation of personality structure: Statistical models. Journal of Research in Personality, 31, 439-485.

Gorsuch, R. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hahs-Vaughn, D. L. (2016). Applied multivariate statistical concepts. New York: Routledge.

Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191–205. 

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.

Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the number of common factors. Educational and Psychological Measurement, 29(3), 571–578.

Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). Guilford Press.

Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: what did they really say? Organizational Research Methods, 9(2), 202–220.

Lim, S., & Jahng, S. (2019). Determining the number of factors using parallel analysis and its recent variants. Psychological Methods, 24(4), 452–467.

O'Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments & Computers, 32(3), 396–402.

Pearson, R., Mundfrom, D., & Piccone, A. (2013). A comparison of ten methods for determining the number of factors in exploratory factor analysis. Multiple Linear Regression Viewpoints, 39.

Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Upper Saddle River, NJ: Pearson.

Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16(2), 209–220.

Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3(2), 231–251.

Watkins, M. W. (2021). A step-by-step guide to exploratory factor analysis with SPSS. New York: Routledge.

West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56–75). Sage Publications, Inc.

Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), 432–442. 




